The Software Engineering (SE) course is one of the backbones of today's
computer technology. Effective theoretical and practical learning in this
course is essential for computing students. However, many students fail this
course, and many aspects influence a student's performance. Current student
performance analysis methods focus only on historical achievement and the
assessment methods used in class. More research is needed to predict student
performance and so address the problem of student failure. The objective of
this research is to predict student performance in SE using an enhanced
Multilayer Perceptron (MLP) machine learning classifier with Adaboost. This
research also investigates the entry requirements of each student before
registering for this course. The proposed approach achieved 87.76 percent
accuracy in classifying the performance of SE students.

1. INTRODUCTION

Software engineering (SE) is one of the important courses for addressing real-life problem solving and
improving poor software quality [1-3]. It provides solutions to decrease project complexity, manage project
time and budget, and ensure that a project is developed systematically, measurably, and within
specification [2]. Software engineering helps to reduce the time spent on daily human tasks by allowing the
user to press a single button to finish multiple jobs at one time. Furthermore, all IT areas, including
graphics, networking and computer maintenance, need software to assist the user in completing jobs [4-6].

However, many SE students are still unable to graduate successfully and fail to meet industry
demands [7]. Many aspects influence student performance, such as learning resources, the learning
environment, the teaching and learning process in the classroom, and academic background [8]. The
assessment and qualification aspects of student performance in SE can differ between countries, depending
on the education system applied in each country. Consequently, there is a need for research to predict the
performance of students (pass or fail) before they register for the SE course. One technique for assessing
student performance through an e-learning system uses the meaningful learning concept [9]. This research,
however, focuses on predicting student performance using a machine learning implementation of Adaboost
and Multilayer Perceptron [10-13].

The aim of this research is to propose Adaboost-MLP to predict the performance of SE students. The
contributions of this study are as follows:

- To build a model with strong predictive performance, this paper uses real case data on SE students
from Universiti Malaysia Pahang (UMP) [14]. With this real case data, the study is able to train a
machine learning algorithm and develop a good prediction model.

- It investigates multiple features of SE students, for instance Malaysian University English Test
(MUET) results, entry qualifications, gender and status (graduate or fail).

- It uses Adaboost, a boosting method that turns the multilayer perceptron into a strong learner, for
efficient machine learning results.

This paper is organized as follows. Section 2 is a literature review that discusses, analyzes and
compares existing works with the proposed research. Section 3 describes the methodology of the
experiment. The results derived from the experiment are given in Section 4. Finally, Section 5 draws a
conclusion from the results.


2. LITERATURE REVIEW 

This section explains some aspects that influence student performance in the Software Engineering (SE)
course. It also introduces the machine learning techniques used in this paper. Finally, it reviews the
work related to this research.

2.1. Aspects that influence student performance 

In the Software Engineering course, three aspects influence student performance. The first aspect
relates to the subject course. The history of subject courses already taken by a student is important
because it provides the prerequisite skills and knowledge that students need before they take the SE
course. The second aspect is the student. This is important because every student has their own ability
to understand and solve problems in every course they take. The third aspect is the teacher. The teacher
is one of the key actors who carry out the learning process in the classroom.

2.1.1. Subject course 

In order to excel in the SE course, students need to learn the basics of programming gradually, for
instance basic syntax, structure, and style. One of the crucial parts of programming is translating an
algorithm into code. If the algorithm is correct, it will produce the right program and achieve the
required objectives. Many students struggle with this step and therefore tend to fail. Students who fail
the examination need to repeat this subject, because the SE subject is a prerequisite for the next
programming subject, which requires a higher skill level than the previous one [15, 16].

2.1.2. Student 

Student interest plays an important role in learning programming subjects; learning is difficult for
students who are uninterested in SE subjects. Furthermore, different students have different motivations.
Learning styles also differ between students: some students prefer to organize group discussions, while
others prefer to study alone [15, 17].

2.1.3. Teacher 

Teachers play an important role in delivering knowledge efficiently to students (Gomes and
Mendes, 2007). Teachers are responsible for giving clear explanations to students and suggesting solutions
to students' problems. Teachers should also have expertise in managing classroom situations, as this can
affect how effectively students learn [15, 18].

2.2. Machine learning technology 

Machine learning emerged from the idea that a computer can “learn” a specific task in order to
solve a real-life problem. The aim of a machine learning solution is to make predictions and support
decisions for future improvement [19]. Machine learning has been applied in many domains. Some existing
studies have used machine learning to predict student performance [20-22]. This research, however,
focuses on predicting student performance using an enhanced Multilayer Perceptron (MLP) machine
learning classifier with Adaboost.

Boosting is a method for improving the accuracy of a machine learning algorithm. Adaboost, also
known as adaptive boosting, is a boosting method introduced in 1995 by Freund and Schapire. Compared to
previous boosting methods, Adaboost resolved many of their practical difficulties. To produce a precise
classification result, it calls a weak learning algorithm repeatedly in a series of rounds until a good
score is reached. In each round, it generates a new weak prediction rule, until the outcome is
satisfactory.
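
The round-by-round behaviour described above can be illustrated with a minimal sketch. It is not the WEKA configuration used in this study: for brevity the weak learner is a one-dimensional threshold rule (a decision stump) rather than an MLP, and the toy data and all function names are assumptions made for illustration only.

```python
import math

def stump_predict(x, threshold, polarity):
    """Weak rule: predict `polarity` when x >= threshold, else the opposite label."""
    return polarity if x >= threshold else -polarity

def train_stump(xs, ys, weights):
    """Return (weighted_error, threshold, polarity) of the best stump."""
    best = None
    for threshold in sorted(set(xs)):
        for polarity in (1, -1):
            error = sum(w for x, y, w in zip(xs, ys, weights)
                        if stump_predict(x, threshold, polarity) != y)
            if best is None or error < best[0]:
                best = (error, threshold, polarity)
    return best

def adaboost(xs, ys, rounds):
    """Train up to `rounds` weighted stumps; labels must be +1 or -1."""
    n = len(xs)
    weights = [1.0 / n] * n          # start with uniform sample weights
    ensemble = []
    for _ in range(rounds):
        error, threshold, polarity = train_stump(xs, ys, weights)
        if error >= 0.5:             # no better than chance: stop
            break
        error = max(error, 1e-10)    # avoid division by zero for a perfect rule
        alpha = 0.5 * math.log((1.0 - error) / error)
        ensemble.append((alpha, threshold, polarity))
        # Raise the weight of misclassified samples, lower the rest.
        weights = [w * math.exp(-alpha * y * stump_predict(x, threshold, polarity))
                   for x, y, w in zip(xs, ys, weights)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def predict(ensemble, x):
    """Weighted vote of all weak rules."""
    score = sum(alpha * stump_predict(x, t, p) for alpha, t, p in ensemble)
    return 1 if score >= 0 else -1

# Toy data (hypothetical): no single threshold separates the two labels.
xs = list(range(10))
ys = [1, 1, 1, -1, -1, -1, 1, 1, 1, -1]
ensemble = adaboost(xs, ys, rounds=3)
print([predict(ensemble, x) for x in xs])
```

On this toy data no single stump is correct everywhere, but the weighted vote of three stumps classifies every training point correctly, which is the sense in which boosting turns a weak learner into a strong one.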

This paper uses the Multilayer Perceptron (MLP) for prediction, as MLP is one of the best-known
algorithms in the Artificial Neural Network (ANN) category and has produced good results in many research
areas. MLP is a neural network inspired by the neurons of the human brain; it produces a set of outputs
from a set of inputs through multiple layers. This research therefore explores the combination of MLP
and the Adaboost algorithm to improve the prediction of whether an SE student will graduate or fail.
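
As a rough illustration of how an MLP maps a set of inputs to a set of outputs through multiple layers, the sketch below runs one forward pass through a tiny network. The layer sizes, weights, and sigmoid activation are hypothetical values chosen for the example, not the trained model from this study.

```python
import math

def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer(inputs, weights, biases):
    """One fully connected layer: weighted sum per neuron, then sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]

def mlp_forward(inputs, layers):
    """Feed the inputs through each (weights, biases) layer in turn."""
    activations = inputs
    for weights, biases in layers:
        activations = layer(activations, weights, biases)
    return activations

# Hypothetical network: 3 inputs -> 2 hidden neurons -> 1 output.
layers = [
    ([[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]], [0.0, 0.1]),
    ([[1.2, -0.7]], [0.05]),
]
output = mlp_forward([1.0, 0.0, 1.0], layers)
print(output)  # a single value between 0 and 1
```

In a classifier, an output like this would be thresholded (for example at 0.5) to choose between the two classes; training adjusts the weights and biases, which are fixed by hand here.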

2.3. Analysis and comparison of existing works 

This section compares previous studies with our proposed study on student performance in software
engineering. Table 1 summarizes the related work on student performance evaluation. It shows that two
studies use statistical methods to evaluate student performance, while two studies, including ours, use
machine learning (ML). Moreover, the previous ML study used Random Forest (RF) without Adaboost, while
this study proposes Adaboost with MLP. In addition, we use a new dataset from Universiti Malaysia
Pahang [14], which is a real case study. The dataset covers SE students who registered in 2013 and
graduated (or failed) in 2017. Because this paper uses a real case study from UMP, it contributes to
predicting SE performance at UMP and, by extension, at public institutions in Malaysia.


Table 1. Comparison of previous and proposed studies

Related works | Research title | Domain | Method
[23, 24] | Impact of Pre-University Factors on the Motivation and Performance of Undergraduate Students in Software Engineering | To evaluate the impact of factors prior to university on the performance and motivation of undergraduate freshmen students in Software Engineering | Statistical analysis
[25] | Major problems in basic programming that influence student performance | To identify problems and causes faced by programming students | Statistical Package for the Social Science (SPSS) software version 19
[26] | Using the Random Forest Classifier to Assess and Predict Student Learning of Software Engineering Teamwork | Develop effective machine-learning-based methods for assessment and early prediction of student learning effectiveness in Software Engineering teamwork | Random Forest (RF) machine learning (ML)
Current work | To Predict the Student’s Performance of Software Engineering by Using Machine Learning | Detect the performance of students in the Software Engineering course | Nature-inspired machine learning (Adaboost-MLP) to detect performance (fail or graduated)


3. METHOD 

This section discusses the methodology of the study, which is depicted in Figure 1. Initially, we
acquired the dataset of software engineering students from the academic center of UMP. We used R-Studio
to filter the dataset, and the Waikato Environment for Knowledge Analysis (WEKA) to perform the
experiments and evaluations. We then present the timeline of activities and the software and hardware
used in this study.



[Figure 1: flowchart of the methodology. Steps: (1) understanding the problem (which features determine
whether students will excel or fail in the software engineering course, and how to predict whether
students will graduate or fail in software engineering in the future); (2) obtaining the dataset of
software engineering students who graduated or failed at Universiti Malaysia Pahang; (3) cleansing the
data and extracting the features from the dataset; (4) applying MLP machine learning classification with
Adaboost; (5) performing evaluation.]

Figure 1. Methodology

3.1. Understanding the problem

In this modern era, the country’s future depends on information technology (IT), and the young
generation will drive the country's technology in the future. Software education is a key area for
technological development. However, the performance of Software Engineering students in programming
subjects has been declining markedly [24]. According to a study from UKM [25], students who are weak in
the programming subject have low confidence in finishing individual tasks and depend on the help of
other students. These students also show very little initiative, especially those whose performance is
moderate or weak. They always depend on other sources, such as answer schemes and assistance from
lecturers and friends, to help them solve the tasks. From another point of view, many students are unable
to determine whether the SE course is suitable for them and whether they can graduate successfully in SE.
These problem statements raise various questions. The research questions of this study are therefore as
follows:

- What are the features that determine whether students will graduate or fail in the software engineering
course? For instance: previous MUET results and entry qualification (matriculation, Malaysian Higher
School Certificate (Sijil Tinggi Persekolahan Malaysia, STPM), or Diploma).

- How can we predict whether students will graduate or fail in software engineering in the future?

3.2. Getting raw data 

As mentioned above, we collected the dataset from the academic center of UMP [14]. It is real case
data consisting of the results of software engineering students. The dataset contains information about
MUET results, entry qualifications, gender, and status. It is important to note that this study is the
first to examine this dataset. Because this study aims to predict the performance of SE students, it
needs sound data from SE that reflects a real situation.

3.3. Feature engineering 

In feature engineering, the data must be cleaned because it arrives in raw form [27]. Cleansing
involves deleting information that is ambiguous, useless, incomplete or missing, for instance the
student's home state, the student's age, and contact number. Once this step was done, we were able to
extract the features and make several discoveries about student status (graduate or fail).
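
The cleansing step can be sketched as follows. The study performed this filtering in R-Studio; the Python version below is only an illustration, and the column names and sample records are hypothetical because the actual layout of the UMP dataset is not given here.

```python
# Columns kept as features (hypothetical names); everything else is dropped.
KEPT_COLUMNS = ("muet_band", "entry_qualification", "gender", "status")

def clean_records(records):
    """Drop irrelevant columns and discard rows with missing feature values."""
    cleaned = []
    for record in records:
        row = {name: (record.get(name) or "").strip() for name in KEPT_COLUMNS}
        if all(row.values()):        # keep only complete rows
            cleaned.append(row)
    return cleaned

# Invented sample records, for illustration only.
raw = [
    {"muet_band": "3", "entry_qualification": "Matriculation", "gender": "F",
     "status": "graduate", "state": "Pahang", "age": "23", "contact_number": "000"},
    {"muet_band": "", "entry_qualification": "Diploma", "gender": "M",
     "status": "fail", "state": "Johor", "age": "24", "contact_number": "000"},
    {"muet_band": "4", "entry_qualification": "STPM", "gender": "M",
     "status": None, "state": "Perak", "age": "22", "contact_number": "000"},
]
cleaned = clean_records(raw)
print(cleaned)
```

Only the first record survives: the second lacks a MUET band and the third lacks a status, and the state, age and contact number columns are removed from every row.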


Table 2. MUET bands with user levels

Band | 1 | 2 | 3 | 4 | 5 | 6
User | Very Limited | Limited | Modest | Satisfactory | Proficient | Highly Proficient


[Figure 2: chart of student status against MUET result (title: Status / MUET's Result; axis: MUET's
Result)]

Figure 2. Feature analysis between status and MUET's results

Figure 2 shows status (graduate or fail) against MUET result. MUET is an English proficiency test
administered by the Malaysian Examinations Council (MEC) [28, 29]. Table 2 lists the MUET bands with
their user levels. Band 1 represents students who are very limited in English, and Band 2 those who are
limited users of English. Similarly, Band 3 represents modest users, while Band 4 represents satisfactory
users. Band 5 represents proficient users, and Band 6 represents students who are very fluent in English
and are considered highly proficient. The graph shows that most of the students who obtained Band 3 in
MUET (modest English users) were able to graduate successfully in software engineering. On the other
hand, the band with the fewest students (Band 6, highly proficient users) also had the fewest graduates
in software engineering. This indicates that students are able to graduate in software engineering even
if they are only modest in English.


[Figure 3: bar chart of graduated and failed students by entry qualification (Diploma, Matriculation,
STPM) and gender (title: Entry Qualification / Gender)]

Figure 3. Feature analysis between status, entry qualification and gender


Figure 3 shows the combined features of status (graduate or fail), entry qualification (Diploma,
Matriculation and STPM) and gender (male or female). In total, 991 students enrolled in SE. Most students
registered in SE on the basis of matriculation, followed by STPM (the Malaysian Higher School
Certificate) in second place and diploma in third. More than half of the students enrolled on the basis
of matriculation, comprising 330 female and 210 male students. Fewer than half enrolled on the basis of
the remaining two entry qualifications: 255 students had STPM and 203 had a diploma. This suggests that
students from matriculation may have a passion for this course, or may be following their friends in
registering for it.

On the gender aspect, more female than male students registered on the basis of Diploma and
Matriculation, whereas more male students registered on the basis of STPM. The numbers of failed and
graduated students are shown as red and blue bars. For Diploma and Matriculation, more female than male
students failed in SE: 36 female students in Matriculation and 11 in Diploma, compared with 26 male
students in Matriculation and 6 in Diploma. Furthermore, the graphs suggest that female students with
Diploma and Matriculation are more interested in SE than female students from STPM, where there are only
95 female students against 119 male students. In conclusion, female students appear more interested in
the SE course than male students, but more female than male students also failed in SE.

3.4. Training and testing the model

To measure the performance of the predictions, the model must first be trained on the dataset. WEKA
provides several test options: use training set, supplied test set, cross-validation, and percentage
split. This study used cross-validation, in which the training and testing sets are disjoint parts of
the data, so the data in the testing part is excluded from the training set. Cross-validation is used to
guard against overfitting and to make the predictions more general.

The dataset is used to train Adaboost with MLP as the base algorithm (Adaboost-MLP) in order to
determine how accurate the predictive model is. The training data is generalized to obtain a more widely
applicable classifier. In this study we used 10-fold cross-validation: the data is divided into 10 folds
and, in each of the 10 rounds, k-1 folds (here 9) are used for training while the remaining fold is used
for testing.
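
The fold construction described above can be sketched as follows. This is a simplified illustration rather than WEKA's implementation: it splits the samples in their original order, whereas tools such as WEKA typically also shuffle and stratify the data first. The function name is hypothetical, and the sample count of 991 is simply the total enrolment reported in Section 3.3.

```python
def k_fold_indices(n_samples, k=10):
    """Yield (train_indices, test_indices) for each of the k folds."""
    indices = list(range(n_samples))
    # Spread any remainder over the first folds so sizes differ by at most one.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    start = 0
    for size in fold_sizes:
        test = indices[start:start + size]                    # one fold for testing
        train = indices[:start] + indices[start + size:]      # the other k-1 folds
        yield train, test
        start += size

for fold_number, (train, test) in enumerate(k_fold_indices(991, k=10), start=1):
    print(fold_number, len(train), len(test))
```

Every sample appears in exactly one test fold, so each record is used for testing once and for training in the other nine rounds; the reported accuracy is then aggregated over the 10 rounds.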

3.5. Evaluation 

This study evaluates the performance of Adaboost-MLP in predicting software engineering students
using accuracy as the benchmark, that is, the percentage of students whose performance (graduate or
fail) is classified correctly.
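
Accuracy as defined above can be computed with a few lines of code. The labels in the example are invented for illustration and are not taken from the UMP dataset.

```python
def accuracy(actual, predicted):
    """Percentage of predictions that match the actual labels."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("label lists must be non-empty and of equal length")
    correct = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * correct / len(actual)

# Hypothetical labels: three of the four predictions are correct.
actual = ["graduate", "fail", "graduate", "graduate"]
predicted = ["graduate", "fail", "fail", "graduate"]
print(accuracy(actual, predicted))  # 75.0
```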


4. RESULTS 

This section discusses the results of the Adaboost-MLP prediction of SE performance (graduate or
fail). The model achieved an accuracy of 87.76 percent in classifying performance as graduate or fail.
With this Adaboost-MLP model, the study is able to predict student outcomes before students register for
the SE course at Universiti Malaysia Pahang or any other institution. If the predicted outcome is fail,
we advise the student to concentrate and pay more attention to the SE course, or to change to a course
that interests them.


5. CONCLUSION 

In a nutshell, this study achieved its objective, which was to predict whether future students in the
Software Engineering course will graduate or fail, using an enhanced Multilayer Perceptron (MLP) machine
learning classifier with Adaboost. The study achieved 87.76 percent accuracy in classifying the
performance of software engineering students. Because it predicts a student’s future performance,
students are able to reconsider their decision before registering for the software engineering course.
For students predicted to fail, we suggest advising them to register for another course that is more
suitable for them, instead of spending four years studying software engineering. Lastly, students who
are predicted to excel in software engineering can be encouraged to register for the course.